Abstract

Visual expertise in medicine has been a subject of research for many decades. 
Interestingly, it has been investigated from two largely unrelated fields: one 
that focused mainly on visual search while ignoring the higher-level cognitive 
processes involved in medical expertise, and one that focused mainly on these 
higher-level cognitive processes while largely ignoring the relevant visual aspects. 
Consequently, the two research lines have traditionally used different methodologies. 
Recently, this gap has been closing, and this special issue presents methods 
to investigate visual expertise in medicine from both research lines, namely those 
investigating vision (eye tracking, pupillometry, flash-preview moving-window 
paradigm), verbalisations, brain activity, and performance measures (ROC analysis, 
gesture coding, expert performance approach). We discuss the benefits and drawbacks 
of each method and suggest directions for future research that could help to unbox the 
black box of visual expertise in medicine. 

1. Visual expertise in medicine as a research field 

Expertise is known to be highly domain-specific (Chi, 2006). Hence, findings cannot be generalized 
across different domains, and often not even across tasks. The organizers of this 
special issue therefore made an important decision in focusing on medical visual expertise. In this way, we can draw 
concrete conclusions from each contributing article to unbox the nature of visual expertise in medicine. The 
topic of visual expertise in medicine has received increasing attention over the past years (Gegenfurtner, 
Lehtinen, & Saljo, 2011; Kok & Jarodzka, 2016; Kok & Jarodzka, 2017; Norman, Coblentz, Brooks, & 
Babcook, 1992; Reingold & Sheridan, 2011; Van der Gijp et al., 2016). This comes as no surprise as most 
medical tasks require some sort of visual input, in the form of a medical image, a tissue sample, or the 
patient him- or herself. As a result, different theoretical models were constructed that describe different 
aspects of medical expertise. Holistic models (for recent descriptions, see Kundel, Nodine, Conant, & 
Weinstein, 2007; Nodine & Mello-Thoms, 2010) describe how experts visually search abnormalities on 
medical images. Another group of theories focuses more on cognitive aspects of expertise and 
describes its development (Boshuizen & Schmidt, 1992; Feltovich, Johnson, Moller, & Swanson, 
1984; Norman, Young, & Brooks, 2007). Decades of research related to this group of models have provided 
us with a thorough idea of how expertise is constituted, although most often taking only the cognitive 
component into consideration (e.g., ECG studies: Gilhooly et al., 1997). Decades ago, Lesgold et al. (1988) tried 
to marry these two research lines into one unified model. Unfortunately, this model has not been 
developed further since. Recently, we combined the model of Lesgold et al. (1988) with more recent 
cognitive expertise models (Boshuizen & Schmidt, 2008a) into one model, presented in Figure 1 (Jarodzka, 
Boshuizen, & Kirschner, 2012). Because these types of models emerged from rather independent 
research fields, they have been studied with different research methodologies. For instance, while visual 
search models were often studied with eye tracking data and ROC abnormality detection rates, cognitive 
models were often studied with different forms of verbal data. Over the past few years, however, these 
distinctions have ceased to hold, and both research lines now learn from each other. The current special issue 
presents these methodologies, irrespective of the research line in which they were originally used. This provides the 
ground for a more unified theoretical model of visual expertise in medicine, which will ultimately unbox 
the black box of visual expertise in medicine. 

Figure 1. Theoretical model combining the classical model of Lesgold et al. (1988) with recent cognitive 
models of medical expertise (Boshuizen & Schmidt, 2008) as published in Jarodzka, Boshuizen, & Kirschner 
(2012). 


2. Methods presented in the current special issue and the concepts they address 

This special issue brings together diverse methods that all aim at “unboxing the black box” of 
medical expertise from different angles. We chose to structure them according to the concepts they 
mainly focus on (Figure 2). Our structure corresponds to the one Gegenfurtner and Van 
Merrienboer (2017) use in their introduction to the current special issue: activation (brain activity), 
detection (vision), inference (verbalisation), and practice (observations). 

Figure 2. Methods to capture diverse concepts related to visual expertise presented in the current special issue. 

2.1 Vision - the sensory input 

These articles present methods to measure the sensory input of the medical specialist. How can 
measuring sensory input help us understand (visual) expertise? Efficient information processing is a 
central part of expertise and its development (Reingold & Sheridan, 2011). On the one hand, an expert is 
able to detect subtle cues and interpret them within a certain context, and can also detect patterns within 
seemingly random elements. On the other hand, our information-processing system is not a passive recipient, 
but rather actively searches for meaningful information. The theory of Boshuizen and Schmidt (2008b) gives 
indications of how this process unfolds: information elements that enter the cognitive systems of novices 
and intermediates activate nodes within large knowledge networks - depending on their experience, this 
happens more or less efficiently. Experts also begin with a passive reception of information 
elements. These, however, instantly activate one or several illness-scripts, which in turn guide the further 
active search for information (Jarodzka, Boshuizen, et al., 2012). Hence, the passive intake or the active 
search of (visual) information elements has the potential to reveal crucial aspects of a person’s expertise. The 
current special issue provides three manuscripts that discuss methodology to address this aspect of 
expertise. 

2.1.1 Eye tracking 

The idea that medical experts “see more” than untrained individuals do suggests itself immediately when 
you watch a medical expert “reading” an X-ray or a mammogram. Hence, medical expertise 
was investigated with eye tracking very early on (for a comprehensive overview, see Reingold & 
Sheridan, 2011). Eye tracking (Holmqvist et al., 2011) entails (1) the apparatus that measures the motion of 
the eyeballs in relation to a stimulus, (2) the software that relates parameters derived from the eye 
movements to certain parts of the stimulus (in time or space), and (3) the researchers, who interpret these 
findings within a theoretical framework. The clear benefits of this methodology are that it unobtrusively 
and directly captures unconscious processes, and that it captures all relevant visual input to working memory. On the 
other hand, eye tracking data are ambiguous, task-dependent, idiosyncratic, and, last but not least, 
challenging to collect and analyse. 

The article by Fox and Faulkner-Jones (2017) provides a brief historical overview of eye tracking. 
For a broader view, we would like to refer the reader to the informative and entertaining book by Wade and 
Tatler (2005) on the history of eye tracking. We applaud Fox and Faulkner-Jones for their excellent analysis 
of the different medical tasks and how they should be approached differently by means of eye tracking. We 
fully agree with them, in particular because it is well known how task-dependent expertise is and how broad the 
field of medicine is. We hope that this detailed analysis will forestall overly generalized 
statements, such as one that these authors, surprisingly, made themselves: “eye-tracking studies across medical 
specialties have suggested that more experienced physicians require fewer fixations, and less time spent on 
areas of interest, [...] than novices.” (p. 3). Such statements are not only too reductionistic to reveal 
interesting insights into the nature of expertise but, even worse, they are often simply wrong. In 
many medical areas, we find exactly the opposite to be true, namely that experts look longer at 
relevant areas of interest (e.g., Balslev et al., 2012). That does not mean that studies finding one or the 
other were wrong; it means that these findings cannot be generalized, but depend on the exact task and the 
stimuli that were used. 

What many of the eye tracking studies on medical expertise reported here “suffer” from is that they 
report eye tracking measures that are too basic to allow conclusions about the nature of expertise. One 
example is the finding reported by Fox, Law, and Faulkner-Jones (2016) that trainees make more eye 
movements than experts. By itself, this statement comes down to a simple time-on-task difference that does 
not make use of the potential that eye tracking offers as a methodology. One example of how to derive more 
concrete statements from eye tracking is a study by Kok, De Bruin, Robben, and Van 
Merrienboer (2012). These authors investigated how experts, in comparison to medical residents and 
students, visually explored focal and diffuse diseases on chest X-rays. Amongst other things, they calculated a 
measure that captured how broadly an image was scrutinized, namely the global/local ratio of 
saccades. In this way, the authors showed that images containing a diffuse disease (i.e., a disease that is 
spread all over the lungs and cannot be pinned down to one location) were visually examined in a broader 
way, as reflected in a higher global/local ratio. Similarly, in one of our own studies (Jaarsma, Jarodzka, Nap, 
Van Merrienboer, & Boshuizen, 2014), we investigated how expert pathologists, pathology residents, 
and medical students visually examine pathological slides. Amongst other things, we found that experts and 
residents diagnosed the slides equally well. However, the way they processed the slides differed considerably: 
experts looked immediately at the relevant location and scrutinized it with long fixations, and 
afterwards had time to explore the rest of the slide - with short fixations - for other potential abnormalities. 
Residents, on the other hand, took quite some time to detect the relevant location, and they examined it up to 
the end of the trial to verify their diagnosis. Hence, experts had capacity left over in this task, while 
residents were at their maximum. This could have made a difference for more complex cases, for instance, 
with several different diseases in one case. Hence, the potential of eye tracking can be exploited much more fully 
by going beyond the ‘standard’ eye tracking measures and looking more concretely into the characteristics of 
the task and the stimulus at hand. 
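
To make the idea of a global/local saccade ratio concrete, the following is a minimal sketch under stated assumptions, not the procedure of Kok et al. (2012). It assumes that fixations are given as (x, y) positions in degrees of visual angle, and that a saccade counts as "global" when its amplitude exceeds a cut-off. The function names and the cut-off of 2.0 degrees are hypothetical and chosen for illustration only.

```python
import math

def saccade_amplitudes(fixations):
    """Distances between consecutive fixation positions (same unit as the input)."""
    return [math.dist(a, b) for a, b in zip(fixations, fixations[1:])]

def global_local_ratio(fixations, threshold_deg=2.0):
    """Number of long ('global') saccades divided by number of short ('local') ones.

    threshold_deg is an arbitrary illustrative cut-off, not a published value.
    """
    amplitudes = saccade_amplitudes(fixations)
    n_global = sum(a > threshold_deg for a in amplitudes)
    n_local = len(amplitudes) - n_global
    if n_local == 0:
        # No local saccades: the ratio is unbounded (or undefined without any saccades).
        return math.inf if n_global else float("nan")
    return n_global / n_local

# Hypothetical scanpath: two short and two long saccades.
scanpath = [(0, 0), (1, 0), (5, 0), (5, 1), (10, 1)]
ratio = global_local_ratio(scanpath)
```

A higher value indicates that a larger share of the saccades covers long distances, which is one way to express a broader visual examination of an image.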

Another important aspect that Fox and Faulkner-Jones (2017) point out is the lack of research on 3D 
and dynamic medical images. We fully agree, but would like to point to research not 
mentioned by these authors, namely that by Bertram, Helle, Kaakinen, and Svedstrom (2013) on CT images and our 
own research on interactive digital pathology slides (Jaarsma et al., 2016; Jaarsma, Jarodzka, Nap, Van 
Merrienboer, & Boshuizen, 2015; Jaarsma et al., 2014) and on patient-video cases (Balslev et al., 2012). 
Moreover, the authors mention on several occasions the potential eye tracking has for medical education. We 
agree. On that note, we would like to point to the idea of using eye movements of experts in 
instructional videos (Van Gog, Jarodzka, Scheiter, Gerjets, & Paas, 2009) and its successful application to 
medicine (e.g., Jarodzka, Balslev, et al., 2012). On a final note, it is important to mention that Fox and 
Faulkner-Jones (2017) discussed many visual search models, but did not make the connection to current 
models of medical expertise and its development (Jarodzka, Boshuizen, et al., 2012). However, as we 
have argued elsewhere (Kok & Jarodzka, 2016), this connection is crucial for theoretical development. 

2.1.2 Pupillometry 

Szulewski, Kelton, and Howes (2017) describe pupillometry as a method to capture cognitive load in 
relation to visual expertise in medicine. This idea is very appealing, as it would allow the unobtrusive 
online measurement of cognitive processes during medical task performance. Pupillometry is the use of one 
very specific measure derived from eye tracking equipment, namely the size of the pupil and how it changes 
over time. We have to keep in mind, though, that the main purpose of changes in pupil size is the 
adaptation of the eyes’ photoreceptors to the lighting conditions to allow for optimal vision (Davson, 2012). 
This is a reflex that everyone can easily observe: look closely into a mirror in a brightly lit room. Now cover 
one of your eyes with your hand. Uncover this eye after a moment and observe how your two eyes differ: the 
eye that was open all the time - and thus exposed to light - has a small pupil. The other eye, the one that was 
exposed to relative darkness, has a larger pupil. This, however, changes very quickly, and you can observe 
how your pupil shrinks to adapt your sight to the changed lighting conditions. 

Research has shown that the size of the pupil may also change when lighting conditions are stable. It 
may vary according to the interest a participant shows in a stimulus (Hess & Polt, 1960), their emotional 
state (Vanderhasselt, Remue, Ng, & De Raedt, 2014), the musical chills they experience (Laeng, Eidet, 
Sulutvedt, & Panksepp, 2016), and many other exciting concepts (Loewenfeld, 1999). This measure can even 
be a valid indicator for certain diseases, such as Parkinson’s (Wang, McInnis, Brien, Pari, & Munoz, 2016). 
On the other hand, pupillometry is a rather coarse measure that often cannot provide specific predictions 
(e.g., failure in predicting sexual orientation: Savin-Williams, Cash, McCormack, & Rieger, 2016). In any case, it 
is important to keep in mind that all these changes in pupil size are subtle and can easily be overridden by a 
ray of light falling onto the eye. It is therefore important to keep lighting conditions meticulously equal for 
both eyes over the entire experiment. This holds not only for the laboratory room; the luminosity of the stimulus 
presentation screen also has to be kept stable. These are conditions that can be realized in fundamental 
laboratory research, but are difficult to realize in applied research, such as medical education. Hence, we will 
often have to wonder whether the pupil size changes were due to the emotional state of the participants, their 
mental effort, etc., or rather to the inevitable changes in the light falling onto their eyes from this particular 
stimulus or their position in the recording room. 

Szulewski et al. (2017) also discuss these and other severe drawbacks of using pupillometry in real-life 
settings; still, they come to the optimistic conclusion that this method would have “a particularly 
promising role in the field of medicine and in the study of physician expertise development”. We would 
rather suggest addressing these methodological problems by triangulating pupillometry with other mental 
effort measures that are less vulnerable in real-life settings, for instance, questionnaires (e.g., Paas, 1992), 
dual-task paradigms (Brünken, Steinbacher, Plass, & Leutner, 2002), or other less vulnerable physiological 
data, such as skin resistance (e.g., Nourbakhsh, Wang, Chen, & Calvo, 2012). 

2.1.3 The flash-preview moving-window (FPMW) paradigm 

The manuscript by Litchfield and Donovan (2017) presents the ‘moving window paradigm’ to 
investigate visual expertise. McConkie and Rayner (1975) introduced this paradigm to investigate the so-called 
perceptual span in reading. Only a window of a few characters to the left 
and to the right of the current fixation remains visible, while the text outside it is masked, to investigate to which extent this influences reading. The underlying idea is that 
we move our eyes in one direction when reading a text (often from left to right or vice versa, depending on 
the language), and that the amount of information we can take in towards this direction, without fixating it, 
increases with increasing expertise in reading. As the visual processing of a written text is so clearly 
predefined, we know exactly how our eyes will move along a line. Unsurprisingly, McConkie and Rayner showed (and 
many later studies confirmed and refined this) that our perceptual span in reading is skewed to 
the right and indeed depends on our expertise. This method is widely used and well established in reading 
research. When looking at medical pictures, however, there is no such clearly predefined gazing direction as 
in reading (e.g., line by line, from left to right). Hence, the methodological set-up is more complicated. 
Typically, in such ‘scene perception’ settings, researchers use a Gaussian blurring technique to capture the 
size of the perceptual span (i.e., the image is blurred apart from a certain area around the current fixation 
point). The “flash-preview moving-window paradigm” (Litchfield & Donovan, 2017) offers an alternative 
solution. It combines a method of providing participants with only a short glimpse of an image (‘flash 
preview’: Kundel & Nodine, 1975) with the ‘moving window paradigm’ described above. 
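
The core of a gaze-contingent moving window can be illustrated with a minimal sketch, under strong simplifying assumptions. It is not the implementation used by Litchfield and Donovan (2017): the image is assumed to be a small grid of grey values, the window is circular, and everything outside it is replaced by a uniform fill value rather than blurred. The function name and parameters are hypothetical. In a real experiment, this step would be repeated for every new gaze sample delivered by the eye tracker.

```python
def moving_window(image, fix_row, fix_col, radius, fill=0.0):
    """Return a copy of `image` in which only a circular window around the
    current fixation (fix_row, fix_col) is visible; all other pixels are masked.

    `image` is a list of rows of grey values; `radius` is given in pixels.
    """
    radius_sq = radius * radius
    return [
        [
            pixel if (r - fix_row) ** 2 + (c - fix_col) ** 2 <= radius_sq else fill
            for c, pixel in enumerate(row)
        ]
        for r, row in enumerate(image)
    ]

# Hypothetical 5 x 5 uniform image, fixation at the centre, window radius of 1 pixel.
image = [[1.0] * 5 for _ in range(5)]
visible = moving_window(image, fix_row=2, fix_col=2, radius=1)
```

Varying the radius across trials and comparing search performance between window sizes is what allows the size of the perceptual span to be estimated.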

The really clever aspect of using this method to investigate visual expertise in medicine is that it 
allows estimating (1) to which extent the pre-activation of a schema based on visual input influences how a 
subsequent visual search is carried out, and (2) the exact size of the perceptual span in relation to expertise 
level. Regarding the latter, research shows that - at least in other domains - the size of the visual span increases with higher 
levels of expertise. The first point is more interesting, though: to which extent does an initial schema 
activation guide the actual search for information relevant to this schema? Litchfield and Donovan could not 
find strong empirical evidence for this idea (based on the model of Kundel, Nodine, & Toto, 1991). This 
comes as no surprise when consulting cognitive theories on medical expertise (as summarized in Jarodzka, 
Boshuizen, et al., 2012). These theories assume that medical experts activate a set of schemata that are - 
partly - instantiated, tested, and often discarded. Hence, from these theories we would assume that this is 
an ongoing process rather than a strictly serial one. It would be very interesting for future research to 
use these cognitive theories on medical expertise to inform FPMW research in medicine. Considering how a 
medical expert would proceed in the real world aligns well with these cognitive theories (Jarodzka, 
Boshuizen, et al., 2012) and would be very interesting in at least two ways. First, in a real-life situation the 
expert would first receive background information on the patient, which would activate an illness-script 
(more complex than a schema, see also Figure 1). This activated illness-script would already guide the 
expert in his or her subsequent visual search on the medical image. Interestingly, Litchfield and Donovan 
(2017) did something in this direction in their third experiment, by showing the target word to the 
participant right before the flash-preview. It would be very informative for further theoretical developments 
to extend this approach by using realistic patient data. Second, in real life, the task of the medical expert is 
not simply to state the presence or location of a target, but goes far beyond that: providing a diagnosis, requesting 
further examinations of the patient, and finally suggesting a treatment. Including these aspects in FPMW 
experiments would allow seeing not only a potential influence on the visual search itself, but also on its 
accompanying cognitive processes. 

2.2 Verbalisations - the working memory output 

From a cognitivist perspective, medical expertise development has mainly been investigated using 
verbal methods, even for those domains that draw heavily on visual skill (e.g., radiology: Lesgold et al., 
1988; ECG interpretation: Gilhooly et al., 1997). At the same time, the visual challenges of those fields 
were not much in focus within this paradigm, as it was felt that verbal methods lacked the acuity to discern 
and untangle perceptual processes. Van de Wiel’s article (2017) does a great job of showing how verbal 
methods can be used to reveal lines of reasoning and knowledge application by medical experts, 
intermediates, and novices. It also shows that the validity of verbal methods depends a lot on the vocabulary 
mastered by the participants, on the skill of the researcher in identifying references to visual qualities (Helle (2017) 
refers to Ericsson’s (2006) “non-verbal thoughts”, captured by brief labels and referents), and on 
success in separating perception of features of the image (if necessary by means of additional 
methods such as pointing or drawing) from the interpretation of patterns of features in the protocols. The 
usefulness of verbal methods, stand-alone or in combination with eye tracking, also depends on a couple of 
other things: the kind of visual information involved, and - not surprisingly - the research question. 

Even when we restrict ourselves to (stacks of) pictures resulting from medical imaging techniques 
(e.g., EEG, ECG, X-ray images, microscopic pathological slides), the differences in the amount of 
information embedded in such pictures are huge. A basic ECG consists of one wiggly line that should show a 
repetitive pattern with several features such as a P, R and T-top and associated Q and S-inflections with 
their specific amplitude and latency. Absence of these features and irregularities in the pattern may have 
clinical meaning. Importantly, students learn the interpretation of these visual presentations as a combination 
of features of the visual appearance, the associated vocabulary, and the biomedical and clinical 
interpretation. Although it visualises complex phenomena, an ECG is a simple line, even though more advanced 
equipment can visualise several measurements simultaneously (up to twelve for complex diagnoses). A 
similar analysis applies to EEG, though the number of channels recorded is much higher. The potential 
amount of information in X-ray pictures, fMRI and PET scans, and microscopic images is much higher than in 
line graphs. They are (stacks of) 2D, multi-coloured or greyscale pictures that may maximally vary pixel by 
pixel, independent of the colour or the greyness of adjacent pixels. A final difference is related to the 
nature of certain diseases, which may be focal or diffuse. Images of local conditions may show isolated, 
discernible lesions that can be pointed at, but other disease processes only show themselves in the qualities 
of the image (e.g., ‘cloudy’ or ‘milky’; see Kok et al., 2012). These differences in the visual qualities of the 
domain under investigation have implications for vocabulary building, and thus for the usefulness of verbal 
reports generated by participants of different expertise levels (see Van de Wiel, 2017), and for foveal 
detection, and thus for the usefulness of eye tracking (see Helle, 2017). 

It is almost a platitude to state that the research question affects the data collection and 
analysis methods to be used. Yet, the articles by Van de Wiel and by Helle neglect to problematize the 
assumption that feature extraction should be differentiated from pattern recognition and interpretation (Van 
de Wiel), and the question of whether deep (meaning) coding of think-aloud protocols is better than superficial coding that 
stays close to the exact verbalisations. There are strong indications that visually detecting 
relevant visual features among irrelevant ones goes hand in hand with interpretation, and is guided by a 
person’s expectations. Many interpretative choices in visual information processing seem to take place in the 
early and non-analytic phase (Norman et al., 2007). On the other hand, recognition of even obvious features 
(for instance, a clear-cut jaundice) is a gradual process that is interlaced with the developing hypotheses 
about what might be wrong. Manipulation of these hypotheses can lead to the non-recognition of such 
features (Brooks, LeBlanc, & Norman, 2000; LeBlanc, Brooks, & Norman, 2002; LeBlanc, Dore, Norman, 
& Brooks, 2004; LeBlanc, Norman, & Brooks, 2001). Our own, allegedly superficial, analyses of 
verbal protocols turned out to reveal aspects of perception and cognition (Jaarsma et al., 2014) that we had 
never become aware of in earlier research that made use of deep, semantic analyses (see, for instance, 
Boshuizen & Schmidt, 1992). Lack of a professional vocabulary in novice groups seems to be associated 
with a lack of a repertoire of perceptual features in that domain, thus hampering perception and interpretation. 
Stated differently, what might be interpreted as ‘poor’ protocols may be a veridical representation of the 
perceptual and cognitive skills of the participant. A decision for or against one or the other interpretation cannot 
be made solely on procedural features of the way the research method was applied, but requires a 
theory-based assessment of expertise level, image quality and research question. 


2.3 Brain activity - neural activity and blood flow 

Measuring the neural activity of the brain gives us the most ‘pure’ look inside the brain. It is 
tempting to believe that one day we will be able to observe the brains of experts while they are performing a 
task in their domain and watch their brilliance unfold before our very eyes. However, we are not quite 
there yet, and the question is whether we ever will be - or ever need to be. As fascinating as fancy new 
visualizations of activities in the brain might be, we have to keep in mind what they represent: increased 
blood flow in some regions of the brain in comparison to others (fMRI: Huettel, Song, & McCarthy, 2008) 
or increased neural activity somewhere in the brain (EEG: Niedermeyer & Da Silva, 2005). Thus, we can 
observe and record some physiological processes with high specificity either in time (EEG) or in space (fMRI). What we 
cannot tell, though, is which thoughts these activities represent. We need to keep that in mind when 
estimating the insights we can derive from such techniques. 

Imagine lying on a narrow stretcher, while your head is fixed within a small cage-like object. You are 
asked not to move and are left alone in the room. Then the stretcher begins to move into a tube which makes 
strange noises. At first you might feel very scared (many people do!) and later on come close to falling asleep 
(also quite common). This is what it feels like to participate in an fMRI scan. From this we can 
immediately tell that this is an extremely fundamental laboratory study. The reason for this extremely 
restricted position of the participant is that even the slightest movements (even blinks and eye movements) 
can cause neural activity that was not induced by the experimental intervention. That is also the reason for 
the very many repetitive trials the experimenter has to run on one participant (to filter this noise out). The 
situation is similar for other measures of neural activity. 

When talking about visual expertise research, this seems to be an extremely reductionistic approach. 
This has to do with two issues. First, expertise is the result of decades of deliberate practice and can only be measured 
under representative circumstances (e.g., Ericsson & Lehmann, 1996). Obviously, the scenario described 
above does not represent any form of visual expertise in present-day medical practice. A serious problem is 
that expertise is extremely domain- and task-specific and thus evolves only under the very specific 
circumstances of the task itself. Drawing conclusions on expertise from pseudo-expertise tasks (note that 
studying artificial objects, even for several sessions, does not meet the definition of real expertise) will 
never be possible. Second, when measuring neural activity or blood flow, do we really do justice to the 
nature of expertise in its complexity? For these and other, more pragmatic reasons (the costs of conducting such 
research), most of the research presented in the article by Gegenfurtner, Kok, Van Geel, De Bruin, and 
Sorger (2017) is actually not related to medical expertise. Only three of the presented studies 
investigated medical expertise in that they involved real medical tasks (Fiorio, Cesari, Bresciani, & Tinazzi, 
2010; Melo et al., 2011; Ribas, Rocha, Siqueira Ortega, Freitas de Rocha, & Massad, 2013). Interestingly, 
the first studies show activation differences in relation to medical expertise - at least for certain stimuli (Hruska 
et al., 2016). The challenge remains to understand what these differences actually mean in terms of 
expertise development. Hence, as interesting as these studies are, it is difficult to draw concrete conclusions 
from them yet, and clearly far more research is needed. 

So, do we argue that such medical expertise research is pointless? On the contrary! But we need to 
be very careful about what it can be used for. We argue that such research on neurological processes can inform 
fundamental research on memory and attention, which in turn can inform cognitive science, which forms a 
basis for (medical) expertise research. On that route, it could eventually even reach educational research. 
What is now urgently needed to make this flow of information possible are solid theoretical models that allow 
for connections between these research fields. 

2.4 Observations - behavioural performance 

One key step in expertise research is to establish whether participants are ‘real’ experts by checking 
whether their performance systematically exceeds that of individuals with less expertise (Ericsson & 
Lehmann, 1996; Ericsson & Smith, 1991). This is easier for some domains than for others (e.g., chess 
expertise can be clearly defined by the Elo rating system). For visual expertise in medicine, performance estimates 
are not trivial, as Krupinski (2017) describes in her article. In this section, we review three articles that 
capture very different aspects of performance related to visual expertise in medicine. 

2.4.1 ROC analysis 

Krupinski (2017) describes the well-established method of Receiver Operating Characteristic 
(ROC) analysis to tackle this issue. This analysis method allows scrutinizing in great detail the ability of medical specialists 
or lay persons to detect one abnormality in a medical image. This detailed 
statistical analysis allows for clear interpretations of the findings. The article provides a very comprehensive 
and concrete ‘hands-on’ guide to conducting this specific methodology. Such an article is of extreme practical 
value for other researchers. Unfortunately, such publications are still rare, even though more of them are 
needed. This holds true even for the current special issue: the article by Krupinski (2017) has the highest 
practical value for other researchers interested in the topic of visual expertise in medicine - and not only for them. 
Many other domains of visual expertise could benefit from this approach, too, as long as the task can be 
boiled down to a binary, exclusive decision. 
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This is also where the drawbacks of the ROC method begin. Even though, it is very well-established 
as a sophisticated statistical method, it is only applicable for a very limited type of task, namely the binary, 
exclusive decision. However, visual expertise in medicine goes far beyond that, as the author mentions 
already in the very beginning of her article. Already the detection itself goes beyond this simplified present 
vs. not-present decision: the medical specialist needs to know WHERE the abnormality is located, whether 
other ones are present as well, etc. More recent types of ROC analyses take these issues into account (LROC, 
JAFROC, etc.) and have been used for many years already. Although it is important to understand ROC first, 
similar articles to this one on more up-to-date versions of ROC would be very important. But still, visual 
expertise in medicine goes beyond the mere detection of abnormalities. It is only the very first step and does 
not capture the entirety of medical expertise performance (e.g., the following diagnostic or treatment 
decision). The next question is thus, whether it is possible to extend these methods to other forms of 
performance as well. 
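
To make the binary present vs. not-present setting concrete, the following is a minimal sketch of how an empirical ROC curve and the area under it (AUC) can be computed from an observer's confidence ratings. It is not taken from Krupinski (2017): the function names, the 5-point confidence scale and the eight example cases are hypothetical and serve only as an illustration, and the trapezoidal AUC is just one of several possible estimators.

```python
from typing import List, Tuple


def roc_points(truth: List[int], ratings: List[float]) -> List[Tuple[float, float]]:
    """Return (false-positive rate, true-positive rate) pairs, one per threshold.

    truth   -- 1 if the abnormality is present in the case, 0 if it is absent
    ratings -- the observer's confidence that the abnormality is present
    """
    n_pos = sum(truth)
    n_neg = len(truth) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one abnormal and one normal case")

    points = [(0.0, 0.0)]
    # Sweep the decision threshold from the strictest to the most lenient.
    for threshold in sorted(set(ratings), reverse=True):
        tp = sum(1 for t, r in zip(truth, ratings) if r >= threshold and t == 1)
        fp = sum(1 for t, r in zip(truth, ratings) if r >= threshold and t == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points: List[Tuple[float, float]]) -> float:
    """Area under the empirical ROC curve, using the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical data: four abnormal and four normal cases, rated on a 1-5 scale.
truth = [1, 1, 1, 1, 0, 0, 0, 0]
ratings = [5, 4, 4, 2, 3, 2, 1, 1]

curve = roc_points(truth, ratings)
print(curve)
print(auc(curve))
```

An AUC of 0.5 corresponds to chance performance and 1.0 to perfect separation of abnormal from normal cases. Note that the sketch only uses the present/absent label of each case; it says nothing about where the abnormality is located, which is exactly the limitation discussed above.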

On a final note, the author explains how the specific ROC curve and its interpretation depend on the 
observer’s background and experience. However, she does not draw the connection to existing and well- 
established theories on medical expertise and its development (for a summarized model of them, see 
Jarodzka, Boshuizen, et al., 2012). For this method to be able to further contribute to more general (medical) 
expertise research, this connection to existing cognitive theories on medical expertise (development) is 
urgently needed and should be the next step in this research line. 

2.4.2 Gesturing 

The study by Ivarsson (2017) investigates a different observable aspect of medical (visual) expertise, 
namely gestures. As described above, verbalising visual processes is a rather difficult endeavour for which 
gestures can be really helpful. The article by Ivarsson shows very concrete examples that depict exactly this. 
What is important to know is that this study - in contrast to all other articles in this special issue - does not 
deal with the diagnosis of an individual expert. Instead, it acknowledges that a lot of medical practice is 
carried out in groups. This fact already makes this article a unique contribution to the current special issue. 
Ivarsson investigates a communication situation between several medical professionals. In such a scenario, 
non-verbal aspects of communication play a crucial part, especially in the medical 
profession, as this article shows. It might be interesting to know that even though the author mainly addresses 
movements of the limbs, there seems to be another body part involved as well: examples [21] and [22] 
indicate that the professional was not only gesturing with her hands, but also guiding the listeners’ attention 
with her gaze. This is a well-studied and important phenomenon: following the gaze of others strongly 
guides our attention (Anstis, Mayhew, & Morley, 1969), in particular in conversations (Argyle & Cook, 
1976; Mansfield, Farroni, & Johnson, 2003). Hence, gaze guidance could be explicitly included in such an 
analysis. 

An important question within this research line is what the exact function of the gesture is. In the 
case described in this article, the main function of the gestures used is to establish a common ground 
between the professionals in a discourse. But is this really the sole purpose of these gestures? Are they merely 
communicative or are they an inherent part of a schema? Research in eye tracking has shown that people 
often make eye movements even when these do not provide any information, such as when looking at a 
blank screen or being in the dark (Foulsham et al., 2012; Johansson & Johansson, 2014). In these cases, 
participants automatically move their eyes when remembering a previously encoded scene. If they are forced not 
to move their eyes, their recall performance drops significantly. Hence, these eye movements (also a form 
of body motion) have a functional role in restoring long-term memory content. Couldn’t the same be true - at 
least in part - for gestures? 

Ivarsson (2017) chose a rather atypical task for these experts. The situation is very specific and the 
episode rather short. So, what can we learn from that? The purpose of this endeavour can only be hypothesis 
building, and further research needs to follow to test these hypotheses. Such research might, for instance, 
investigate whether the gestures found here are typically used by these professionals, such as the ones 
indicating the representations of digital manipulations [7-9]. Is there a gesture ‘language’ that professionals 
use? A different way to apply such methodology would be to investigate individual professionals. For instance, 
future research could investigate gestures that are part of the clinical routine (e.g., surgery, but also a 
radiologist turning the X-ray upside down or holding it at a particular angle) or are used as preparations for 
the clinical routine (a phenomenon that often can be observed in sports). These analyses could be 
triangulated with other data. In our own research, we have investigated the interplay between hand 
movements as navigation within a digital pathology slide, eye movements on this slide, and the 
verbalizations about this examination (Jaarsma et al., 2016; Jaarsma et al., 2015; Jaarsma et al., 2014). This 
approach has also been investigated in a more natural setting with mobile eye tracking, although in the 
non-medical task of tea making (Tatler et al., 2013). Such a triangulated analysis of eye-hand coordination 
could be a very meaningful addition to the analysis of gestures. 

2.4.3 The expert performance approach 

Williams, Fawver, and Hodges (2017) provide an excellent overview of methodologies of research 
on expertise. They describe three steps, namely (1) developing representative tasks that elicit systematic 
performance differences in individuals of different levels of expertise, (2) process-tracing techniques to study 
processes underlying expert performance and (3) identifying individual characteristics in training or learning 
that lead to the expert level. We would like to add to this list the importance of a thorough definition and 
description of different expertise levels, which is described very concretely in the medical domain by 
Boshuizen and Schmidt (2008b). 

What Williams and colleagues (2017) then mainly focus on is studying the learning towards 
expertise. In principle, this could be done from two perspectives. On the one hand, one can identify 
successful methods that improve performance towards visual expertise. One example of an instructional 
method to train aspects of visual expertise is eye movement modelling examples (Van Gog et al., 2009). 
These are instructional videos showing how an expert model approaches a task. To this end, the model 
verbally explains the steps taken in this task. Moreover, the attentional focus of the expert (based on his or 
her eye movements) is overlaid on this video. We have already successfully applied this method in the 
medical domain (Jarodzka, Balslev, et al., 2012). We showed that diagnosing patient-video cases improved 
after such training not only in terms of performance, but also in terms of the visual processes. On the other 
hand, Williams and colleagues (2017) favour another approach: the investigation of individual learning 
trajectories to identify ‘good’ and ‘poor’ learners. Based on such an analysis, they argue, not only could 
instruction be improved, but the identification of future experts might also become possible. A long-standing 
research line on self-regulated learning in non-medical professions could be very informative for such research 
(Kicken, Brand-Gruwel, Van Merrienboer, & Slot, 2009). In any case, Williams et al. - and we fully agree with 
them - call for two 
important issues: (a) more longitudinal studies to truly understand the development of visual expertise in 
medicine, as well as (b) a process-tracing approach to identify relevant (and probably heterogeneous) 
processes underlying this development. 


3. Lessons learned and future avenues 

This special issue presented different methods to investigate different aspects of visual expertise in 
medicine emerging from different research lines within this field. To reach the aim of unboxing the black 
box of visual expertise in medicine, we argue that future research in this line should consider the following 
points: 

• Methodological triangulation: we saw that each of the methods presented in this special issue has 
unique potential, but also severe drawbacks. To counterbalance these drawbacks, we need to make 
more use of methodological triangulation when conducting studies on visual 
expertise in medicine (see also: Gegenfurtner et al., 2016; Kok & Jarodzka, 2016). 

• We need to discuss more systematically the challenges we face with new methodologies and 
detailed process measures. Unfortunately, traditional empirical articles leave hardly any room to do this. 
We thus plead for a forum where such issues could be identified and evaluated, and solutions to them 
agreed upon. 

• Several articles in this special issue have shown the importance of more interdisciplinary research 
that combines different research fields, such as medical image perception and scene processing. 

• Finally, as stated many times throughout this discussion, we need more solid theoretical models that 
allow bridges to be formed between the different methodologies presented in this special issue (see also: 
Gegenfurtner et al., 2016; Kok & Jarodzka, 2016). 